
[Rust] [Experiment] Trigonometry kernels #9313

Closed
nevi-me wants to merge 4 commits into apache:master from nevi-me:trigonometry-kernels

@nevi-me (Contributor) commented Jan 25, 2021

This builds on top of #9297.

I was curious whether (ab)using the compute::unary kernel would improve performance on slightly more complex functions.

I implemented the Haversine function, which calculates the great-circle distance between two geographic coordinates (latitude/longitude pairs).
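For reference, here is the per-element computation both versions vectorise. It is a minimal scalar sketch, not the PR's code: it assumes inputs in degrees and a mean Earth radius of 6371 km, and the names haversine and EARTH_RADIUS_KM are hypothetical.

```rust
/// Assumed mean Earth radius in kilometres (a common approximation).
const EARTH_RADIUS_KM: f64 = 6371.0;

/// Hypothetical scalar haversine: great-circle distance between two points
/// whose latitude/longitude are given in degrees.
fn haversine(lat1: f64, lon1: f64, lat2: f64, lon2: f64) -> f64 {
    let (phi1, phi2) = (lat1.to_radians(), lat2.to_radians());
    let d_phi = (lat2 - lat1).to_radians();
    let d_lambda = (lon2 - lon1).to_radians();
    // a = sin²(Δφ/2) + cos φ1 · cos φ2 · sin²(Δλ/2)
    let a = (d_phi / 2.0).sin().powi(2)
        + phi1.cos() * phi2.cos() * (d_lambda / 2.0).sin().powi(2);
    // c = 2 · asin(√a) is the central angle in radians
    let c = 2.0 * a.sqrt().asin();
    EARTH_RADIUS_KM * c
}

fn main() {
    // One degree of longitude along the equator.
    println!("{:.3} km", haversine(0.0, 0.0, 0.0, 1.0));
}
```

Every step is a trigonometric call or scalar arithmetic on one element, which is why it maps naturally onto a single closure per element.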
I then benchmarked two versions: one that I tried to simplify and optimise with unary kernels, and one written the way I'd have to write it if I couldn't use unary kernels for things like:

  • arithmetic with scalars
  • functions that would otherwise require generating intermediate arrays (e.g. sin(x) * cos(x) would be multiply(sin(x), cos(x)))
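The difference between the two styles can be sketched without Arrow. This is an illustration under stated assumptions: it uses plain f64 slices instead of Arrow arrays, ignores null bitmaps, and unary, sin, cos, multiply and the other function names are hypothetical stand-ins rather than the arrow crate's API.

```rust
/// Hypothetical stand-in for a unary kernel: applies `op` to every value
/// in a single pass and allocates one output buffer.
fn unary<F: Fn(f64) -> f64>(values: &[f64], op: F) -> Vec<f64> {
    values.iter().map(|&v| op(v)).collect()
}

// Hypothetical element-wise kernels; each call allocates a new array.
fn sin(values: &[f64]) -> Vec<f64> {
    unary(values, f64::sin)
}
fn cos(values: &[f64]) -> Vec<f64> {
    unary(values, f64::cos)
}
fn multiply(a: &[f64], b: &[f64]) -> Vec<f64> {
    assert_eq!(a.len(), b.len(), "arrays must have equal length");
    a.iter().zip(b).map(|(x, y)| x * y).collect()
}

/// sin(x) * cos(x) composed from kernels: two intermediate arrays plus the result.
fn sin_cos_kernels(x: &[f64]) -> Vec<f64> {
    multiply(&sin(x), &cos(x))
}

/// The same expression fused into one closure: one pass, one output array.
fn sin_cos_unary(x: &[f64]) -> Vec<f64> {
    unary(x, |v| v.sin() * v.cos())
}

/// Scalar arithmetic (degrees to radians) also fits inside the closure,
/// so no scalar-broadcast kernel is needed.
fn deg_to_rad_unary(x: &[f64]) -> Vec<f64> {
    unary(x, |v| v * std::f64::consts::PI / 180.0)
}

fn main() {
    let x = [0.0, 0.5, 1.0, 2.0];
    // Both styles perform the same floating-point operations per element.
    assert_eq!(sin_cos_kernels(&x), sin_cos_unary(&x));
    println!("{:?}", sin_cos_unary(&x));
    println!("{:?}", deg_to_rad_unary(&[0.0, 90.0, 180.0]));
}
```

The fused version reads the input once and allocates once, while the composed version allocates and traverses an array for sin, for cos, and for the product. Fewer passes and allocations are a plausible, though unprofiled, explanation for the speed difference.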

The implementation that uses unary kernels for the above is faster: it takes roughly 20% less time in the results below.

I ran this on an Apple M1 CPU with the following options:

cargo bench --bench trigonometry_kernels
cargo bench --bench trigonometry_kernels --features simd
RUSTFLAGS="-C target-cpu=native" cargo bench --bench trigonometry_kernels
RUSTFLAGS="-C target-cpu=native" cargo bench --bench trigonometry_kernels --features simd

haversine_no_unary 512          time:   [14.074 us 14.140 us 14.216 us]
haversine_unary 512             time:   [11.191 us 11.308 us 11.436 us]
haversine_no_unary_nulls 512    time:   [15.902 us 15.985 us 16.083 us]
haversine_unary_nulls 512       time:   [12.486 us 12.552 us 12.625 us]
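To put a number on the difference, the sketch below compares the middle value of each Criterion-style interval (treated as the point estimate). The helper names are hypothetical, and the inputs are simply the figures listed above.

```rust
/// Percentage of time saved by `after_us` relative to `before_us`.
fn percent_less_time(before_us: f64, after_us: f64) -> f64 {
    (1.0 - after_us / before_us) * 100.0
}

/// How many times faster `after_us` is than `before_us`.
fn speedup(before_us: f64, after_us: f64) -> f64 {
    before_us / after_us
}

fn main() {
    // Middle (point) estimates from the benchmark output, in microseconds:
    // (label, no_unary, unary)
    let runs = [("non-null", 14.140, 11.308), ("nulls", 15.985, 12.552)];
    for &(name, no_unary, unary) in runs.iter() {
        println!(
            "{}: {:.1}% less time, {:.2}x speedup",
            name,
            percent_less_time(no_unary, unary),
            speedup(no_unary, unary)
        );
    }
}
```

This prints "non-null: 20.0% less time, 1.25x speedup" and "nulls: 21.5% less time, 1.27x speedup", so the unary version's advantage holds with and without nulls.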

The biggest benefit comes from setting RUSTFLAGS="-C target-cpu=native": the non-null benchmarks get 3-10% faster.
